A note on stochastic context-free grammars, termination and the EM-algorithm
Abstract
Termination of a stochastic context-free grammar, i.e. almost sure finiteness of the random trees it produces, is shown to be equivalent to extinction of an embedded multitype branching process. We show that the maximum likelihood estimator in a saturated model based on complete or partial observation of a finite tree always gives terminating grammars. With partial observation we show that this in fact holds for the whole sequence of parameters obtained by the EM-algorithm. Finally, aspects of the size of the tree related to the embedded branching process are discussed.
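The equivalence between termination and extinction can be checked numerically. The following is a minimal Python sketch under assumptions of our own; it is not the paper's method. The grammar representation, the tree encoding and all function names (mle_from_tree, extinction_probabilities, terminates, expected_tree_size) are hypothetical.

Terminals are dropped because they spawn no children, so each nonterminal is one type of the embedded branching process. The three computations are as follows:

- Extinction probabilities are computed as the smallest fixed point of the offspring generating functions, by iterating from zero. This is a standard branching-process technique.
- The saturated-model estimator from one fully observed tree is taken to be the relative frequency of each rule among the expansions of its left-hand side.
- The expected tree size solves e = 1 + M e, where M is the mean offspring matrix. This is finite only when the embedded process is subcritical.

```python
from collections import Counter

# Hypothetical representation (not from the paper): each nonterminal maps to a list
# of (rhs, prob) pairs, where rhs lists only the nonterminals on the right-hand side;
# terminals spawn no children, so they do not affect the branching process.
Grammar = dict[str, list[tuple[tuple[str, ...], float]]]
# A derivation-tree node is (nonterminal, [child nodes]); terminals are dropped.
Tree = tuple[str, list]


def mle_from_tree(tree: Tree) -> Grammar:
    """Saturated-model estimate from one fully observed tree: relative rule frequencies."""
    rule_counts: Counter = Counter()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        rule_counts[(label, tuple(child[0] for child in children))] += 1
        stack.extend(children)
    lhs_counts: Counter = Counter()
    for (lhs, _), n in rule_counts.items():
        lhs_counts[lhs] += n
    grammar: Grammar = {}
    for (lhs, rhs), n in rule_counts.items():
        grammar.setdefault(lhs, []).append((rhs, n / lhs_counts[lhs]))
    return grammar


def extinction_probabilities(g: Grammar, max_iter: int = 100_000,
                             tol: float = 1e-12) -> dict[str, float]:
    """Iterate q <- f(q) from q = 0, with f_A(s) = sum over rules of p * prod_{B in rhs} s_B.

    After n steps q[A] is the probability that the tree rooted at A dies out within
    n generations, so the iterates increase to the extinction probabilities.
    """
    q = {a: 0.0 for a in g}
    for _ in range(max_iter):
        new_q = {}
        for a, rules in g.items():
            total = 0.0
            for rhs, p in rules:
                term = p
                for b in rhs:
                    term *= q[b]
                total += term
            new_q[a] = total
        done = max(abs(new_q[a] - q[a]) for a in g) < tol
        q = new_q
        if done:
            break
    return q


def terminates(g: Grammar, start: str, slack: float = 1e-9) -> bool:
    """Termination from `start` means extinction of the embedded process with probability 1."""
    return extinction_probabilities(g)[start] >= 1.0 - slack


def expected_tree_size(g: Grammar, max_iter: int = 100_000,
                       tol: float = 1e-12) -> dict[str, float]:
    """Solve e = 1 + M e by iteration; the limit is finite only in the subcritical case."""
    e = {a: 0.0 for a in g}
    for _ in range(max_iter):
        new_e = {a: 1.0 + sum(p * sum(e[b] for b in rhs) for rhs, p in rules)
                 for a, rules in g.items()}
        done = max(abs(new_e[a] - e[a]) for a in g) < tol
        e = new_e
        if done:
            break
    return e


# Usage: one fully observed tree with two S -> S S expansions and three leaves.
leaf: Tree = ("S", [])
observed: Tree = ("S", [leaf, ("S", [leaf, leaf])])
g_hat = mle_from_tree(observed)
print(g_hat)                                  # S -> S S with 0.4, S -> (leaf) with 0.6
print(terminates(g_hat, "S"))                 # True
print(expected_tree_size(g_hat)["S"])         # close to 5, the observed tree size

# A grammar that is not estimated from a finite tree and does not terminate.
g_bad: Grammar = {"S": [(("S", "S"), 0.6), ((), 0.4)]}
print(extinction_probabilities(g_bad)["S"])   # close to 2/3
print(terminates(g_bad, "S"))                 # False
```

In this one-type example the estimate gives S → S S probability 0.4, so the mean number of children is 0.8 < 1. More generally, a finite tree in which every node has 0 or 2 children has one more leaf than it has internal nodes. The relative-frequency estimate of the branching rule is therefore always below 1/2, which is consistent with the abstract's claim that the estimator yields terminating grammars. Close to the critical case the fixed-point iterations converge slowly, so the tolerances would need adjusting.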
Similar resources
RNA Structure Prediction Including Pseudoknots Based on Stochastic Multiple Context-Free Grammar
Several grammars have been proposed for modeling RNA pseudoknotted structure. In this paper, we focus on multiple context-free grammars (MCFGs), which are a natural extension of context-free grammars and can represent pseudoknots, and extend a specific subclass of MCFGs to a probabilistic model called SMCFG. We present a polynomial time parsing algorithm for finding the most probable derivation tr...
The Inside-Outside Algorithm
This note describes the inside-outside algorithm. The inside-outside algorithm has very important applications to statistical models based on context-free grammars. In particular, it is used in EM estimation of probabilistic context-free grammars, and it is used in estimation of discriminative models for context-free parsing. As we will see, the inside-outside algorithm has many similarities to...
Using evolutionary Expectation Maximization to estimate indel rates
MOTIVATION: The Expectation Maximization (EM) algorithm, in the form of the Baum-Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the ...
Recent Methods for RNA Modeling Using Stochastic Context-Free Grammars
Stochastic context-free grammars (SCFGs) can be applied to the problems of folding, aligning and modeling families of homologous RNA sequences. SCFGs capture the sequences' common primary and secondary structure and generalize the hidden Markov models (HMMs) used in related work on protein and DNA. This paper discusses our new algorithm, Tree-Grammar EM, for deducing SCFG parameters automatical...
Lecture 21: Spectral Learning for Graphical Models
In modern machine learning, latent variables are often introduced into the models to endow them with learnable and interpretable structures. Examples of such models include various state space models of sequential data (such as hidden Markov models), mixed membership models (such as topic models), and stochastic grammars (such as probabilistic context free grammars) used to model grammatical st...
Publication date: 2005